Annotating Chinese Collocations with Multi Information
Abstract
This paper presents the design and construction of an annotated Chinese collocation bank as a resource to support systematic research on Chinese collocations. With the help of computational tools, the bigram and n-gram collocations corresponding to 3,643 headwords are manually identified. Annotations for bigram collocations include the dependency relation, the chunking relation, and the classification of collocation type. Currently, the collocation bank contains 23,581 annotated bigram collocations and 2,752 n-gram collocations extracted from a 5-million-word corpus. Through statistical analysis of the collocation bank, some characteristics of Chinese bigram collocations are examined; these characteristics are essential to collocation research, especially for Chinese.
Similar Resources
Annotating Information Structures In Chinese Texts Using HowNet
This paper reports our work on annotating Chinese texts with information structures derived from HowNet. An information structure consists of two components: HowNet definitions and dependency relations. It is the unit of representation of the meaning of texts. This work is part of a multi-sentential approach to Chinese text understanding. An overview of HowNet and information structure are des...
The Identification and Classification of Unknown Words in Chinese: An N-Grams-Based Approach
In this paper, we propose a new approach to identify unknown words in Chinese. This approach adopts an n-grams program to sort out the collocating word/character sequences which are possible words and phrases in Chinese. In addition to proposing the criteria for identifying Chinese new words, we also classify these new words according to their structural and semantic characteristics. The cor...
Automatic Extraction of English Collocations and their Chinese-English Bilingual Examples: A Computational Tool for Bilingual Lexicography
This paper describes the procedures involved in developing EXEC, a web-based system which can automatically extract English collocations and their Chinese-English bilingual examples from parallel corpora. The system draws on statistics, dependency parsing, and Chinese-English parallel corpora of more than 13 million English words and 27 million Chinese characters. By taking a word as well as th...
Supervised Learning Algorithms Evaluation on Recognizing Semantic Types of Spanish Verb-Noun Collocations
The meaning of such verb-noun collocations as the wind blows, time flies, the day passes by can be generalized as ‘what is designated by the noun exists’. Likewise, the meaning of make a decision, provide support, write a letter can be generalized as ‘make what is designated by the noun’. These generalizations represent the meaning of certain groups of collocations and may be used as semantic a...
Collocation and Trillocation
In this paper we propose that the neglected three-word collocations (trillocations) should be emphasized in collocation study. From the point of view of colligations, more useful collocations could be covered by adding a third category. For a specific third word, it will help avoid the unnaturalness of a two-word collocation. A statistics-based automatic trillocation extracting system is propo...